---
title: ScrapeGraph
---

This guide provides a quick overview for getting started with ScrapeGraph [tools](/oss/integrations/tools/). For detailed documentation of all ScrapeGraph features and configurations, head to the [API reference](https://python.langchain.com/docs/integrations/tools/scrapegraph).

For more information about ScrapeGraph AI:

- [ScrapeGraph AI Website](https://scrapegraphai.com)
- [Open Source Project](https://github.com/ScrapeGraphAI/Scrapegraph-ai)

## Overview

### Integration details

| Class | Package | Serializable | JS support | Version |
| :--- | :--- | :---: | :---: | :---: |
| [SmartScraperTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |
| [SmartCrawlerTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |
| [MarkdownifyTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |
| [AgenticScraperTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |
| [GetCreditsTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |

### Tool features

| Tool | Purpose | Input | Output |
| :--- | :--- | :--- | :--- |
| SmartScraperTool | Extract structured data from websites | URL + prompt | JSON |
| SmartCrawlerTool | Extract data from multiple pages with crawling | URL + prompt + crawl options | JSON |
| MarkdownifyTool | Convert webpages to markdown | URL | Markdown text |
| GetCreditsTool | Check API credits | None | Credit info |

## Setup

The integration lives in the `langchain-scrapegraph` package:

```python
%pip install --quiet -U langchain-scrapegraph
```

```output
Note: you may need to restart the kernel to use updated packages.
```

### Credentials

You'll need a ScrapeGraph AI API key to use these tools. Get one at [scrapegraphai.com](https://scrapegraphai.com).

```python
import getpass
import os

if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")
```
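In non-interactive environments such as scripts or CI jobs, `getpass` may block or fail because there is no terminal to prompt. A minimal alternative, assuming the key is always supplied through the `SGAI_API_KEY` environment variable, is to fail fast with a clear error. The helper name `require_api_key` is hypothetical and not part of `langchain-scrapegraph`:

```python
import os


def require_api_key(var_name: str = "SGAI_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error.

    Hypothetical helper for non-interactive runs; it never prompts.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running this script."
        )
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key immediately, before any tool is instantiated.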

It's also helpful (but not needed) to set up [LangSmith](https://smith.langchain.com/) for best-in-class observability:

```python
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass()
```

## Instantiation

Here we show how to instantiate the ScrapeGraph tools. Each tool reads the API key from the `SGAI_API_KEY` environment variable set above:

```python
import json

from langchain_scrapegraph.tools import (
    GetCreditsTool,
    MarkdownifyTool,
    SmartCrawlerTool,
    SmartScraperTool,
)
from scrapegraph_py.logger import sgai_logger

# Show the ScrapeGraph SDK's log messages at INFO level
sgai_logger.set_logging(level="INFO")

# Each tool picks up SGAI_API_KEY from the environment
smartscraper = SmartScraperTool()
smartcrawler = SmartCrawlerTool()
markdownify = MarkdownifyTool()
credits = GetCreditsTool()
```

## Invocation

### [Invoke directly with args](/oss/langchain/tools)

Let's try each tool individually. The block below calls SmartScraperTool, MarkdownifyTool, SmartCrawlerTool, and GetCreditsTool in turn.

SmartCrawlerTool crawls multiple pages of a website and extracts structured data, with crawling options such as depth control (`depth`), page limits (`max_pages`), and domain restrictions (`same_domain_only`).

```python
# SmartScraper
result = smartscraper.invoke(
    {
        "user_prompt": "Extract the company name and description",
        "website_url": "https://scrapegraphai.com",
    }
)
print("SmartScraper Result:", result)

# Markdownify
markdown = markdownify.invoke({"website_url": "https://scrapegraphai.com"})
print("\nMarkdownify Result (first 200 chars):", markdown[:200])

# SmartCrawler
url = "https://scrapegraphai.com/"
prompt = (
    "What does the company do? and I need text content from their privacy and terms"
)

# Use the tool with crawling parameters
result_crawler = smartcrawler.invoke(
    {
        "url": url,
        "prompt": prompt,
        "cache_website": True,
        "depth": 2,
        "max_pages": 2,
        "same_domain_only": True,
    }
)

print("\nSmartCrawler Result:")
print(json.dumps(result_crawler, indent=2))

# Check credits
credits_info = credits.invoke({})
print("\nCredits Info:", credits_info)
```

```output
SmartScraper Result: {'company_name': 'ScrapeGraphAI', 'description': "ScrapeGraphAI is a powerful AI web scraping tool that turns entire websites into clean, structured data through a simple API. It's designed to help developers and AI companies extract valuable data from websites efficiently and transform it into formats that are ready for use in LLM applications and data analysis."}

Markdownify Result (first 200 chars): [![ScrapeGraphAI Logo](https://scrapegraphai.com/images/scrapegraphai_logo.svg)ScrapeGraphAI](https://scrapegraphai.com/)

PartnersPricingFAQ[Blog](https://scrapegraphai.com/blog)DocsLog inSign up

Op

Credits Info: {'remaining_credits': 49679, 'total_credits_used': 914}
```
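In the output above, GetCreditsTool appears to return a dictionary with `remaining_credits` and `total_credits_used` keys. A minimal sketch of guarding an expensive call (such as a multi-page crawl) on that value, assuming those keys are present. The `has_enough_credits` name and the threshold are hypothetical; the actual credit cost per request is not covered in this guide:

```python
from typing import Any, Mapping


def has_enough_credits(credits_info: Mapping[str, Any], minimum: int) -> bool:
    """Return True if the reported remaining credits meet a caller-chosen minimum.

    Assumes the `remaining_credits` key seen in the GetCreditsTool output;
    a missing key is treated as zero credits.
    """
    remaining = credits_info.get("remaining_credits", 0)
    return int(remaining) >= minimum


# Usage with the dictionary shape printed above
sample = {"remaining_credits": 49679, "total_credits_used": 914}
print(has_enough_credits(sample, minimum=100))
```

In a real run you would pass `credits.invoke({})` instead of `sample`, and skip or postpone the crawl when the check fails.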

The SmartCrawler call can also be run on its own as a standalone script:

```python
# Standalone SmartCrawler example
import json

from langchain_scrapegraph.tools import SmartCrawlerTool
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Will automatically get SGAI_API_KEY from environment
tool = SmartCrawlerTool()

# Target site and extraction prompt
url = "https://scrapegraphai.com/"
prompt = (
    "What does the company do? and I need text content from their privacy and terms"
)

# Invoke with crawling options (caching, link depth, page limit, same-domain restriction)
result = tool.invoke(
    {
        "url": url,
        "prompt": prompt,
        "cache_website": True,
        "depth": 2,
        "max_pages": 2,
        "same_domain_only": True,
    }
)

print(json.dumps(result, indent=2))
```
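To build intuition for how `depth`, `max_pages`, and `same_domain_only` interact, here is a toy breadth-first crawl over a hypothetical in-memory link graph. It is not how ScrapeGraph implements crawling on its servers; it only illustrates one plausible reading of the three limits: depth counted in link hops from the start URL, a cap on the total number of visited pages, and a host check on each link:

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for real pages and their outgoing links
LINKS = {
    "https://example.com/": [
        "https://example.com/privacy",
        "https://example.com/terms",
        "https://other.org/",
    ],
    "https://example.com/privacy": ["https://example.com/privacy/cookies"],
    "https://example.com/terms": [],
    "https://example.com/privacy/cookies": [],
    "https://other.org/": [],
}


def toy_crawl(start: str, depth: int, max_pages: int, same_domain_only: bool) -> list[str]:
    """Breadth-first crawl over LINKS, returning visited URLs in visit order."""
    start_host = urlparse(start).netloc
    visited = []
    seen = {start}
    queue = deque([(start, 0)])  # (url, number of hops from start)
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        visited.append(url)
        if level >= depth:
            continue  # do not follow links beyond the depth limit
        for link in LINKS.get(url, []):
            if same_domain_only and urlparse(link).netloc != start_host:
                continue  # skip links that leave the starting host
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited


print(toy_crawl("https://example.com/", depth=2, max_pages=2, same_domain_only=True))
```

With `max_pages=2`, the crawl stops after the start page and its first same-host link, even though `depth=2` would allow it to go further; the page cap and the depth limit act independently.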

### [Invoke with ToolCall](/oss/langchain/tools)

We can also invoke the tool with a model-generated ToolCall, in which case a `ToolMessage` is returned:

```python
model_generated_tool_call = {
    "args": {
        "user_prompt": "Extract the main heading and description",
        "website_url": "https://scrapegraphai.com",
    },
    "id": "1",
    "name": smartscraper.name,
    "type": "tool_call",
}
smartscraper.invoke(model_generated_tool_call)
```

```output
ToolMessage(content='{"main_heading": "Get the data you need from any website", "description": "Easily extract and gather information with just a few lines of code with a simple api. Turn websites into clean and usable structured data."}', name='SmartScraper', tool_call_id='1')
```
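In the output above, the `ToolMessage` carries the extracted data as a JSON-encoded string in its `content`. A minimal sketch of turning that string back into a dictionary with the standard library; the string is copied from the output, and in practice you would read it from the returned message's `content` attribute:

```python
import json

# Content string copied from the ToolMessage output above
content = (
    '{"main_heading": "Get the data you need from any website", '
    '"description": "Easily extract and gather information with just a few lines '
    'of code with a simple api. Turn websites into clean and usable structured data."}'
)

# Decode the JSON payload into a regular Python dict
data = json.loads(content)
print(data["main_heading"])
```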

## Chaining

Let's use the SmartScraper tool with an LLM to answer a question about a website:

<ChatModelTabs customVarName="llm" />

```python
# %pip install -qU langchain langchain-openai
from langchain.chat_models import init_chat_model

llm = init_chat_model(model="gpt-4o", model_provider="openai")
```

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig, chain

prompt = ChatPromptTemplate(
    [
        (
            "system",
            "You are a helpful assistant that can use tools to extract structured information from websites.",
        ),
        ("human", "{user_input}"),
        ("placeholder", "{messages}"),
    ]
)

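# First pass: force the model to call SmartScraper
llm_with_tools = llm.bind_tools([smartscraper], tool_choice=smartscraper.name)
llm_chain = prompt | llm_with_tools
# Second pass: tools stay bound but optional, so the model can answer in text
answer_chain = prompt | llm.bind_tools([smartscraper])


@chain
def tool_chain(user_input: str, config: RunnableConfig):
    input_ = {"user_input": user_input}
    # Ask the model which SmartScraper call(s) to make
    ai_msg = llm_chain.invoke(input_, config=config)
    # Execute the tool calls; each one returns a ToolMessage
    tool_msgs = smartscraper.batch(ai_msg.tool_calls, config=config)
    # Feed the tool results back so the model can write the final answer
    return answer_chain.invoke({**input_, "messages": [ai_msg, *tool_msgs]}, config=config)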


tool_chain.invoke(
    "What does ScrapeGraph AI do? Extract this information from their website https://scrapegraphai.com"
)
```

```output
AIMessage(content='ScrapeGraph AI is an AI-powered web scraping tool that efficiently extracts and converts website data into structured formats via a simple API. It caters to developers, data scientists, and AI researchers, offering features like easy integration, support for dynamic content, and scalability for large projects. It supports various website types, including business, e-commerce, and educational sites. Contact: contact@scrapegraphai.com.', additional_kwargs={'tool_calls': [{'id': 'call_shkRPyjyAtfjH9ffG5rSy9xj', 'function': {'arguments': '{"user_prompt":"Extract details about the products, services, and key features offered by ScrapeGraph AI, as well as any unique selling points or innovations mentioned on the website.","website_url":"https://scrapegraphai.com"}', 'name': 'SmartScraper'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 47, 'prompt_tokens': 480, 'total_tokens': 527, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_c7ca0ebaca', 'finish_reason': 'stop', 'logprobs': None}, id='run-45a12c86-d499-4273-8c59-0db926799bc7-0', tool_calls=[{'name': 'SmartScraper', 'args': {'user_prompt': 'Extract details about the products, services, and key features offered by ScrapeGraph AI, as well as any unique selling points or innovations mentioned on the website.', 'website_url': 'https://scrapegraphai.com'}, 'id': 'call_shkRPyjyAtfjH9ffG5rSy9xj', 'type': 'tool_call'}], usage_metadata={'input_tokens': 480, 'output_tokens': 47, 'total_tokens': 527, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
```
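The `tool_chain` above follows a two-pass pattern: the model first emits tool calls, the tool results are appended as messages, and the model is called again to write the answer. A minimal, dependency-free sketch of that control flow, using hypothetical stand-ins (`fake_model`, `fake_scraper`) in place of a real chat model and ScrapeGraph, with plain dicts instead of LangChain message classes:

```python
from typing import Any


def fake_scraper(args: dict[str, Any]) -> str:
    """Stand-in for SmartScraperTool: returns a canned JSON string."""
    return '{"company_name": "ExampleCo"}'


def fake_model(user_input: str, messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Stand-in chat model: requests a tool call first, then answers from tool output."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        call = {
            "id": "1",
            "name": "SmartScraper",
            "args": {"website_url": "https://example.com", "user_prompt": user_input},
        }
        return {"role": "ai", "content": "", "tool_calls": [call]}
    answer = f"Answer based on: {tool_results[0]['content']}"
    return {"role": "ai", "content": answer, "tool_calls": []}


def two_pass_chain(user_input: str) -> dict[str, Any]:
    # Pass 1: the model decides which tool calls to make
    ai_msg = fake_model(user_input, [])
    # Run each requested tool call and wrap its result as a tool message
    tool_msgs = [
        {"role": "tool", "tool_call_id": c["id"], "content": fake_scraper(c["args"])}
        for c in ai_msg["tool_calls"]
    ]
    # Pass 2: the model sees its own tool calls plus the results
    return fake_model(user_input, [ai_msg, *tool_msgs])


final = two_pass_chain("What does ExampleCo do?")
print(final["content"])
```

The second pass has to be free to reply in text: if the model is forced to call a tool on every pass, the final message is another tool call rather than an answer.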

## API reference

For detailed documentation of all ScrapeGraph features and configurations, head to [the LangChain API reference](https://python.langchain.com/docs/integrations/tools/scrapegraph) or to [the `langchain-scrapegraph` GitHub repository](https://github.com/ScrapeGraphAI/langchain-scrapegraph).
